Feature-based Clustering of Web Data Sources
نویسندگان
چکیده
منابع مشابه
Feature-based Clustering of Web Data Sources
The proliferation of web data sources increasingly demands the integration of these sources. To facilitate the integration process, a pre-analysis step is required to classify and group data sources into their correct domains. In this paper, we propose a feature-based clustering approach for clustering web data sources without any human intervention and based only on features extracted from the...
متن کاملEntropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
متن کاملClustering Structured Web Sources: A Schema-Based, Model-Differentiation Approach
The Web has been rapidly “deepened” with the prevalence of databases online. On this “deep Web,” numerous sources are structured, providing schema-rich data– Their schemas define the object domain and its query capabilities. This paper proposes clustering sources by their query schemas, which is critical for enabling both source selection and query mediation, by organizing sources of with simil...
متن کاملVision-Based Deep Web Data Extraction for Web Document Clustering
The design of web information extraction systems becomes more complex and time-consuming. Detection of data region is a significant problem for information extraction from the web page. In this paper, an approach to vision-based deep web data extraction is proposed for web document clustering. The proposed approach comprises of two phases: 1) Vision-based web data extraction, and 2) web documen...
متن کاملthe clustering and classification data mining techniques in insurance fraud detection:the case of iranian car insurance
با توجه به گسترش روز افزون تقلب در حوزه بیمه به خصوص در بخش بیمه اتومبیل و تبعات منفی آن برای شرکت های بیمه، به کارگیری روش های مناسب و کارآمد به منظور شناسایی و کشف تقلب در این حوزه امری ضروری است. درک الگوی موجود در داده های مربوط به مطالبات گزارش شده گذشته می تواند در کشف واقعی یا غیرواقعی بودن ادعای خسارت، مفید باشد. یکی از متداول ترین و پرکاربردترین راه های کشف الگوی داده ها استفاده از ر...
ذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Computer Applications
سال: 2012
ISSN: 0975-8887
DOI: 10.5120/9685-4127